Broadcast on Clusters of SMPs with Optimal Concurrency
Authors
Abstract
Broadcast is an important collective communication operation both for user programs and for the underlying high-performance computing platforms. In this paper, we present a hierarchical broadcast method for clusters of SMPs (CSMPs) connected by switches under the one-port model. A broadcast over CSMPs consists of three levels: a sub-broadcast within each SMP node, a sub-broadcast within each switch (intra-switch broadcast), and a sub-broadcast among switches (inter-switch broadcast). With regard to the high performance of switches, our focus is on inter-switch broadcasts. The new inter-switch broadcast is based on a Single-Source Shortest-path Minimum-cost Spanning Tree (SSS-MST). In general, a broadcast over an SSS-MST may perform poorly because of sequentialization effects arising from poor usage of per-link bandwidth and of nodes that have already received the broadcast message under the one-port model. In our new algorithm, we achieve optimal concurrency in every step of the broadcast, so that as many messages as possible are forwarded simultaneously in each step of the SSS-MST while keeping the optimal number of steps and minimum cost. Two heuristics, the from-up-to-down and from-down-to-up algorithms, are proposed to obtain this maximum concurrency using static topological information and link costs. Additionally, for regular programs a local update technique is applied to adapt to dynamic changes in topology and available per-link bandwidth over the underlying interconnect.
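To make the one-port constraint concrete, the following is a minimal sketch (not the paper's SSS-MST heuristics, and not the from-up-to-down or from-down-to-up algorithms): a greedy per-step schedule of a broadcast over a given spanning tree, in which every node that already holds the message forwards it to at most one remaining child per step. The tree layout, switch names, and the largest-subtree-first ordering are illustrative assumptions only.

```python
# Hedged sketch of one-port broadcast scheduling on a fixed spanning tree.
# Assumed, illustrative tree of switches; this is NOT the SSS-MST construction
# or the heuristics proposed in the paper.

def subtree_size(tree, u):
    """Number of nodes in the subtree rooted at u."""
    return 1 + sum(subtree_size(tree, c) for c in tree.get(u, ()))

def one_port_broadcast_schedule(tree, root):
    """Return a list of steps; each step is a list of (sender, receiver) pairs.

    One-port model: in each step every node that already holds the message
    sends to at most one of its still-unserved children.  Children are served
    largest-subtree-first, a common heuristic for reducing the step count.
    """
    pending = {u: sorted(cs, key=lambda c: subtree_size(tree, c), reverse=True)
               for u, cs in tree.items()}
    have, schedule = {root}, []
    while any(pending.get(u) for u in have):
        step = []
        for u in list(have):                 # holders send concurrently
            if pending.get(u):
                step.append((u, pending[u].pop(0)))
        for _, v in step:                    # receivers become holders next step
            have.add(v)
        schedule.append(step)
    return schedule

# Example: a small, made-up inter-switch tree rooted at switch "s0".
tree = {"s0": ["s1", "s2"], "s1": ["s3", "s4"], "s2": ["s5"],
        "s3": [], "s4": [], "s5": []}
for i, step in enumerate(one_port_broadcast_schedule(tree, "s0"), 1):
    print(f"step {i}: {step}")
```

The sketch only shows how per-step concurrency is extracted from a fixed tree under the one-port model; choosing the tree itself (minimum cost, shortest paths from the source) and adapting it to link costs is what the paper's SSS-MST construction and heuristics address.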
Similar Articles
Task Pool Teams: a hybrid programming environment for irregular algorithms on SMP clusters
Clusters of SMPs (symmetric multiprocessors) are popular platforms for parallel programming since they provide large computational power for a reasonable price. For irregular application programs with dynamically changing computation and data access behavior, a flexible programming model is needed to achieve efficiency. In this paper we propose Task Pool Teams as a hybrid parallel programming en...
COMPaS: a PC-based SMP cluster
Symmetric multiprocessor systems have become widely available, both as computational servers and as platforms for high-performance parallel computing. The same trend is found in PCs—some have also become SMPs with many CPUs in one box. Clusters of PC-based SMPs are expected to be compact, cost-effective parallel-computing platforms. At the Real World Computing Partnership,...
A high-productivity task-based programming model for clusters
Programming for large-scale, multicore-based architectures requires adequate tools that offer ease of programming and do not hinder application performance. StarSs is a family of parallel programming models based on automatic function-level parallelism that targets productivity. StarSs deploys a data-flow model: it analyzes dependencies between tasks and manages their execution, exploiting thei...
OpenUH: an optimizing, portable OpenMP compiler
OpenMP has gained wide popularity as an API for parallel programming on shared memory and distributed shared memory platforms. Despite its broad availability, there remains a need for a portable, robust, open source, optimizing OpenMP compiler for C/C++/Fortran 90, especially for teaching and research, e.g. into its use on new target architectures, such as SMPs with chip multithreading, as well...
JavaParty - Transparent Remote Objects in Java
Java’s threads offer appropriate means either for parallel programming of SMPs or as target constructs when compiling add-on features (e.g. forall constructs, automatic parallelization, etc.). Unfortunately, Java does not provide elegant and straightforward mechanisms for parallel programming on distributed memory machines, like clusters of workstations. JavaParty transparently adds remote objec...